Performance of KDB-Trees with Query-Based Splitting
Abstract
While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multidimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. A potentially very appealing idea, frequently suggested in the literature, is to adopt a node-splitting policy that takes into account the “importance” of individual dimensions, which could be determined either a priori or through a statistical sampling of actual queries. This paper presents the results of carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a “cyclic” fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.
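The two split-dimension policies contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the weight estimate (the fraction of sampled queries that constrain a dimension), and the rule for turning weights into a split choice are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def cyclic_split_dimension(depth: int, num_dims: int) -> int:
    """Cyclic policy: rotate through the dimensions as the node depth grows."""
    return depth % num_dims


def dimension_weights(queries: Iterable[Sequence[int]], num_dims: int) -> List[float]:
    """Hypothetical importance estimate: the fraction of sampled queries
    that constrain each dimension. Each query is given as the collection
    of dimension indices it specifies."""
    counts: Counter = Counter()
    total = 0
    for constrained_dims in queries:
        total += 1
        counts.update(set(constrained_dims))
    if total == 0:
        # No sample available: treat all dimensions as equally important.
        return [1.0 / num_dims] * num_dims
    return [counts[d] / total for d in range(num_dims)]


def query_based_split_dimension(weights: Sequence[float],
                                splits_so_far: Sequence[int]) -> int:
    """Assumed query-based rule: choose the dimension whose weight is largest
    relative to how often it has already been split on this path, so that
    splits end up roughly proportional to the weights. Ties go to the
    lowest dimension index."""
    return max(range(len(weights)),
               key=lambda d: weights[d] / (splits_so_far[d] + 1))


if __name__ == "__main__":
    sample = [(0,), (0, 2), (0,), (2,)]   # dimensions constrained by each query
    w = dimension_weights(sample, num_dims=4)
    print("weights:", w)
    print("cyclic:", [cyclic_split_dimension(d, 4) for d in range(6)])
    print("query-based first split:", query_based_split_dimension(w, [0, 0, 0, 0]))
```

Under this assumed rule, dimensions that no sampled query constrains are never chosen while any weighted dimension remains, whereas the cyclic policy spreads splits evenly over all dimensions regardless of the query workload.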
Similar resources
Improvement of Filtering Algorithm for RFID Middleware Using KDB-tree Query Index
RFID middleware collects and filters RFID streaming data gathered continuously by numerous readers to process requests from applications. These requests are called continuous queries. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index because it is necessary to insert a large number of segments into the index. KDB...
A Fast Algorithm for high-dimensional Similarity Joins
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of nd...
High Dimensional Feature Indexing Using Hybrid Trees
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Traditional multidimensional data structures (e.g., R-tree, KDB-tree, gr...
Linear R-tree Revisited
The problem of finding an optimal splitting of overflowed nodes has a major influence on query performance of the R-tree spatial index structure. Most of the previous split heuristics of R-tree-based index structures have quadratic time and face the problem of increasing overlap of the resulting minimum bounding rectangles (MBRs). In this paper, we propose an efficient heuristic method for hand...
A retrieval technique for high-dimensional data and partially specified queries
While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multi-dimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. The retrieval ...